The short text has become the prevalent format for information on the Internet in recent decades, especially with the development of online social media, whose millions of users generate a vast number of short messages every day. Although the sophisticated signals delivered by short texts make them a promising source for topic modeling, their extreme sparsity and imbalance bring unprecedented challenges to conventional topic models such as LDA and its variants. Aiming at a simple but general solution for topic modeling in short texts, we present a word co-occurrence network based model named WNTM, which tackles sparsity and imbalance simultaneously. Unlike previous approaches, WNTM models the distribution over topics for each word instead of learning topics for each document, which enhances the semantic density of the data space without introducing much additional time or space complexity. Meanwhile, the rich contextual information preserved in the word-word space also guarantees its sensitivity in identifying rare topics with convincing quality. Furthermore, adopting the same Gibbs sampling procedure as LDA makes WNTM easy to extend to various application scenarios. Extensive validation on both short and normal texts shows that WNTM outperforms baseline methods. Finally, we also demonstrate its potential for precisely discovering newly emerging topics or unexpected events on Weibo at very early stages.
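As a rough illustration of the word-centric view described above, the sketch below builds, for each word, a "pseudo-document" of its co-occurring neighbors, on which a standard topic model could then be run. The function name, the sliding-window co-occurrence scheme, and the `window` parameter are assumptions for illustration; the abstract does not specify how WNTM constructs its word co-occurrence network.

```python
from collections import defaultdict

def build_pseudo_documents(docs, window=2):
    """For each word, collect the words co-occurring with it within a
    sliding window across all documents. Each word's collected neighbors
    form its pseudo-document, moving the model from the sparse
    document-word space into a denser word-word space.

    Note: the sliding-window scheme here is an illustrative assumption,
    not necessarily the exact construction used by WNTM.
    """
    pseudo = defaultdict(list)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Append every in-window neighbor (excluding the word itself)
            # to this word's pseudo-document.
            for j in range(lo, hi):
                if j != i:
                    pseudo[word].append(tokens[j])
    return dict(pseudo)

# Two short "messages": the word "apple" gathers context from both,
# so its pseudo-document mixes the fruit and the phone senses.
docs = [["apple", "fruit", "juice"],
        ["apple", "phone", "screen"]]
pseudo_docs = build_pseudo_documents(docs, window=2)
```

Running a standard LDA over these pseudo-documents yields a topic distribution per word rather than per document, which is the shift in modeling granularity the abstract attributes to WNTM.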